Understanding the subgraph distribution in random networks is important for modelling complex systems. In classic Erdos networks, which exhibit a Poissonian degree distribution, the number of appearances of a subgraph G with n nodes and g edges scales with network size as $\langle G \rangle \sim N^{n-g}$. However, many natural networks have a non-Poissonian degree distribution. Here we present approximate equations for the average number of subgraphs in an ensemble of random sparse directed networks, characterized by an arbitrary degree sequence. We find new scaling rules for the commonly occurring case of directed scale-free networks, in which the outgoing degree distribution scales as $P(k) \sim k^{-\gamma}$. Considering the power exponent of the degree distribution, $\gamma$, as a control parameter, we show that random networks exhibit transitions between three regimes. In each regime the subgraph number of appearances follows a different scaling law, $\langle G \rangle \sim N^{\alpha}$, where $\alpha = n - g + s - 1$ for $\gamma < 2$, $\alpha = n - g + s + 1 - \gamma$ for $2 < \gamma < \gamma_c$, and $\alpha = n - g$ for $\gamma > \gamma_c$. Here s is the maximal outdegree in the subgraph, and $\gamma_c = s + 1$. We find that certain subgraphs appear much more frequently than in Erdos networks. These results are in very good agreement with numerical simulations. This has implications for detecting network motifs, subgraphs that occur in natural networks significantly more than in their randomized counterparts.

PACS numbers: 05, 89.75 



I. INTRODUCTION

Many natural systems are described as networks of interacting components. Random networks have been studied as models of these complex systems. The classic model for a random network is the Erdos model, in which each of the possible edges in the network exists with probability p. Analytical solutions exist for many of the properties of Erdos networks, such as the diameter, clustering coefficient, component size distributions, and subgraph distributions. The average number of appearances $\langle G \rangle$ of a subgraph with n nodes and g edges in a directed network of N nodes is

$$\langle G \rangle \sim \lambda N^{n-g} \qquad (1)$$

assuming a fixed mean connectivity $\langle K \rangle = pN$. Here $\lambda$ is a term of order 1 which stems from the symmetry of each subgraph. Erdos networks have been extensively used as models for analyzing real networks. An excellent example is the work of Davis, Holland and Leinhardt on subgraphs in social networks.

Erdos networks exhibit a Poissonian degree distribution: the distribution of the number of edges per node is $P(k) = \langle k \rangle^{k} e^{-\langle k \rangle}/k!$. Nodes with a number of edges much higher than the mean are exponentially rare. Many naturally occurring networks, on the other hand, obey a long-tailed degree sequence, often described as a power law, $P(k) \sim k^{-\gamma}$, with $\gamma$ often between 2 and 3. These networks, termed scale-free networks, are characterized by the existence of nodes with high degree, termed hubs (Fig. 1). The existence of hubs dramatically influences the properties of these networks. Some of the global properties of random networks with arbitrary degree distribution, and specifically scale-free networks, have been calculated. These include sizes of connected components, distances, percolation thresholds and clustering coefficients.

FIG. 1: Example of (a) an Erdos network and (b) a scale-free network ($\gamma = 2$). Mean connectivity is 1.85 in both. Notice the hub in the scale-free network.

There is much current interest in the local structure of networks. Recently, subgraph structure was analyzed in biological and technological networks. It was found that these natural or designed networks contain network motifs, subgraphs that occur much more often than in an ensemble of randomized networks with the same degree sequence. In biological networks, the network motifs were suggested to be elementary building blocks which carry out key information processing functions. In these studies, the generation of random networks and the enumeration of their subgraphs were performed numerically. To complement this numerical work, it would be important to characterize the subgraph distribution of random networks theoretically. Here we present approximate formulas for the average number of subgraphs in an ensemble of random networks with an arbitrary degree sequence. In the random ensemble each node has a specified indegree, outdegree, and mutual degree. These formulas give a very good approximation for random networks which allow for multiple edges between nodes (more than one edge in a given direction), as in the well-studied configuration model. We also show that they provide a reasonable approximation for networks where multiple edges are not allowed, which represent many naturally occurring networks more realistically. We apply these formulas to arrive at new scaling laws for networks with a scale-free degree distribution. We find that each subgraph has its own scaling exponent, influenced by its topology. Considering the power exponent of the degree distribution, $\gamma$, as a control parameter, we show that the random networks exhibit transitions between 3 regimes. In each regime the subgraph number of appearances follows a different scaling law. We find that certain subgraphs appear much more frequently than in Erdos networks.
FIG. 2: A subgraph with one mutual edge and 4 single edges. The subgraph degree sequences $\{k_i, r_i, m_i\}$ and node degrees $\{K_i, R_i, M_i\}$ are displayed in bold. Edge probabilities are displayed in plain. Using Eq. (5), the mean subgraph number of appearances in an ensemble of random networks is $\langle G \rangle = \tfrac{1}{2} \langle K(K-1)M \rangle \langle R(R-1)M \rangle \langle RK \rangle^{2} / \left( N \langle K \rangle^{4} \langle M \rangle \right)$.



TABLE I: Mean numbers of the thirteen connected directed subgraphs in an ensemble of random networks with a given degree distribution. The degree distributions are those of transcription interactions in the yeast S. cerevisiae, synaptic connections between neurons in C. elegans, and world-wide-web hyperlinks between web pages in a single domain. Shown are the theoretical values (Eq. 5). The values in parentheses are the percent deviations of the direct enumeration results (see Appendix C), for which 1000 random networks with the same degree distributions as those of the real networks were generated and all subgraphs were counted. The left value is the percent deviation in an ensemble which allows for multiple edges, and the right value shows the deviation for an ensemble which does not allow multiple edges. Values below 0.5 were rounded to zero. In subgraphs marked with *, the theoretical values shown were obtained using the correction of Appendix B to the table equations. Subgraph id is determined by concatenating the rows of the subgraph adjacency matrix and representing the resulting vector as a binary number. The id is the minimal number obtained from all the isomorphic versions of the subgraph.

II. NUMBER OF SUBGRAPHS,
APPROXIMATE SOLUTION

The following approximation assumes sparse networks ($\langle K \rangle \ll N$). The network degree sequence is given by the outdegree $\{K_i\}_{i=1}^{N}$ (the number of edges outgoing from each node), indegree $\{R_i\}_{i=1}^{N}$ (the number of incoming edges at each node), and mutual degree $\{M_i\}_{i=1}^{N}$ (the number of mutual edges at each node). Mutual edges are cases where there is a pair of edges in both directions between two nodes. This property has been studied in social networks and the world wide web. We begin by computing the probability of obtaining an n-node subgraph with $g_a$ single edges, $g_m$ mutual edges, subgraph outdegree sequence $\{k_j\}_{j=1}^{n}$, subgraph indegree sequence $\{r_j\}_{j=1}^{n}$ and subgraph mutual degree sequence $\{m_j\}_{j=1}^{n}$ in a given set of nodes. Consider the example of Fig. 2. The probability of obtaining a directed edge from node 1 to node 2 is approximately

$$P(\mathrm{edge}_1) = \frac{K_1 R_2}{N \langle K \rangle} \qquad (2)$$

assuming $K_1 R_2 \ll N \langle K \rangle$ (see Appendix A). The probability of obtaining a second edge from node 1 to node 3 is:

$$P(\mathrm{edge}_2 \mid \mathrm{edge}_1) = \frac{(K_1 - 1) R_3}{N \langle K \rangle} \qquad (3)$$

This reasoning applies to all the subgraph edges. The mean number of appearances of a subgraph is found by taking the average of the resulting expression with respect to all choices of n distinct nodes $\{\sigma_1 \ldots \sigma_n\}$, and multiplying by the number of possible choices of n nodes out of N:

$$\langle G \rangle = a\, N^{n - g_a - g_m}\, \frac{\left\langle \prod_{j=1}^{n} \binom{K_{\sigma_j}}{k_j} \binom{R_{\sigma_j}}{r_j} \binom{M_{\sigma_j}}{m_j} \right\rangle_{\{\sigma\}}}{\langle K \rangle^{g_a} \langle M \rangle^{g_m}} \qquad (4)$$

where $\langle K \rangle$ is the average outdegree (which equals the average indegree $\langle R \rangle$), and $\langle M \rangle$ is the average mutual edge degree. The symmetry factor a is $c_G^{-1} \prod_{j=1}^{n} k_j!\, r_j!\, m_j!$, where $c_G$ is the number of different permutations of the nodes that give an isomorphic subgraph.

The average (4) reduces to a product of moments of different orders of the indegree, outdegree and mutual degree distributions:

$$\langle G \rangle \approx a\, N^{n - g_a - g_m}\, \frac{\prod_{j=1}^{n} \left\langle \binom{K}{k_j} \binom{R}{r_j} \binom{M}{m_j} \right\rangle}{\langle K \rangle^{g_a} \langle M \rangle^{g_m}} \qquad (5)$$

where the fact that each node should participate in the summation of only one term j introduces higher order corrections which we neglect. For example, subgraph id102 (Table I) has n = 3 nodes, $g_a = 2$ single edges and $g_m = 1$ mutual edge. The subgraph degree sequences are $k_j = \{1,1,0\}$, $r_j = \{0,1,1\}$, and $m_j = \{1,0,1\}$. Using Eq. (5) we find

$$\langle G \rangle (\mathrm{id102}) = \frac{\langle KM \rangle \langle RM \rangle \langle RK \rangle}{\langle K \rangle^{2} \langle M \rangle} \qquad (6)$$
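As an illustration of how Eq. (5) can be evaluated directly from a degree sequence, the following is a minimal sketch rather than the procedure used to produce Table I. The function name `mean_subgraph_count` and its argument layout are hypothetical, and the toy degree sequence is invented for the example; the symmetry factor follows the definition given after Eq. (4).

```python
from math import comb, factorial

def mean_subgraph_count(K, R, M, k, r, m, c_G=1):
    """Evaluate Eq. (5) for one subgraph (hypothetical helper, not from the paper).

    K, R, M: outdegree, indegree and mutual degree of every network node.
    k, r, m: outdegree, indegree and mutual degree of every subgraph node.
    c_G:     number of node permutations giving an isomorphic subgraph.
    """
    N, n = len(K), len(k)
    g_a = sum(k)          # every single edge leaves exactly one subgraph node
    g_m = sum(m) // 2     # every mutual edge touches two subgraph nodes
    mean_K = sum(K) / N
    mean_M = sum(M) / N

    # Symmetry factor a = (1 / c_G) * prod_j k_j! r_j! m_j!
    a = 1.0 / c_G
    for kj, rj, mj in zip(k, r, m):
        a *= factorial(kj) * factorial(rj) * factorial(mj)

    # Product over subgraph nodes of the mixed moments <C(K,k_j) C(R,r_j) C(M,m_j)>
    moments = 1.0
    for kj, rj, mj in zip(k, r, m):
        moments *= sum(comb(Ki, kj) * comb(Ri, rj) * comb(Mi, mj)
                       for Ki, Ri, Mi in zip(K, R, M)) / N

    return a * N ** (n - g_a - g_m) * moments / (mean_K ** g_a * mean_M ** g_m)

# Toy degree sequence (invented), evaluated for subgraph id102 as in Eq. (6).
K = [2, 1, 1, 0]
R = [1, 1, 1, 1]
M = [1, 1, 0, 0]
print(mean_subgraph_count(K, R, M, k=[1, 1, 0], r=[0, 1, 1], m=[1, 0, 1]))
```

For id102 every factorial and binomial coefficient is trivial, so the function reduces to the ratio of moments in Eq. (6).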

The approximation (Eq. 5) is exact in the case of Erdos networks. In Erdos networks, both indegree and outdegree are Poisson distributed and independent, and Eq. (5) reduces to Eq. (1).

For non-sparse networks, a more accurate approximation takes into account the probabilities of a non-existent edge between two nodes (see Appendix B). We tested the equations on random networks with the degree sequences of real-world networks: transcription interactions in the yeast S. cerevisiae, synaptic connections between neurons in C. elegans, and world-wide-web hyperlinks between web pages in a single domain. When multiple edges in the same direction are allowed, as in the configuration model, Eq. (5) is within a few percent of the numerical simulation results (Table I). We have also simulated random networks in which only one edge was allowed in each direction between any two nodes. As can be seen in Table I, Eq. (5) is still within a few percent of the numerical simulation results for most subgraphs. There are some discrepancies (most notably a factor of almost 4 for subgraph id38 in the randomized world wide web network). In addition, we find good agreement between our approximation and numerical enumeration of subgraphs in simulated random networks with scale-free outdegree (Fig. 3).
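The ensemble that allows multiple edges can be pictured with a stub-matching construction. The sketch below is an assumption-laden illustration, not the generator used for Table I: the function name `configuration_model` is hypothetical, mutual degrees are ignored, and self-edges are not rejected.

```python
import random

def configuration_model(out_deg, in_deg, seed=None):
    """Pair every outgoing stub with a random incoming stub (illustrative sketch).

    Multiple edges in the same direction and self-edges may occur, and mutual
    degrees are not controlled; both simplifications are assumptions made here.
    """
    if sum(out_deg) != sum(in_deg):
        raise ValueError("total outdegree must equal total indegree")
    rng = random.Random(seed)
    out_stubs = [node for node, k in enumerate(out_deg) for _ in range(k)]
    in_stubs = [node for node, r in enumerate(in_deg) for _ in range(r)]
    rng.shuffle(in_stubs)
    return list(zip(out_stubs, in_stubs))   # list of (source, target) edges

edges = configuration_model([2, 1, 1, 0], [0, 1, 1, 2], seed=1)
print(edges)
```

Whatever the random pairing, each node keeps its prescribed indegree and outdegree, which is the property the random ensemble is defined by.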



FIG. 3: Subgraph numbers in 1000 random networks with N=2000 nodes, with scale-free outdegree and compact indegree. The outdegree of each node, $K_i$, was picked from the distribution (7), with $\gamma = 2$. The networks were constructed using the algorithm of Newman, Strogatz and Watts, modified so that only a single edge in a given direction is allowed between any two nodes. Theoretical numbers of appearances were computed using the degree sequences of each network (equations in Table I).

III. SCALE-FREE NETWORKS

Scale-free networks have degree distributions that follow $P(k) \sim k^{-\gamma}$ at large k. We consider directed networks in which the outgoing edge degree is scale-free, while the incoming edge degree distribution is Poissonian. Our results can be easily extended to scale-free indegree. For simplicity we choose the following form for the outgoing degree distribution for a network with N nodes (this function has been used to fit world-wide web data):

$$P(k) = \frac{\gamma - 1}{k_0^{1-\gamma}} (k + k_0)^{-\gamma}, \qquad k < N \qquad (7)$$

The mean connectivity $\langle K \rangle$ is determined by $k_0$.
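One way to draw outdegrees from a distribution of the form of Eq. (7) is inverse-transform sampling. The sketch below makes assumptions that are not in the text: k is treated as a continuous variable with cumulative distribution $1 - (1 + k/k_0)^{1-\gamma}$, the result is truncated to an integer, and the cutoff $k < N$ is imposed by clipping. The names `sample_outdegree` and `outdegree_sequence` are hypothetical.

```python
import random

def sample_outdegree(u, gamma, k0):
    """Inverse CDF of Eq. (7) for continuous k >= 0 (cutoff not applied here)."""
    return k0 * ((1.0 - u) ** (-1.0 / (gamma - 1.0)) - 1.0)

def outdegree_sequence(N, gamma, k0, seed=None):
    """Draw N integer outdegrees, clipped to k < N (illustrative sketch)."""
    rng = random.Random(seed)
    return [min(N - 1, int(sample_outdegree(rng.random(), gamma, k0)))
            for _ in range(N)]

print(outdegree_sequence(10, gamma=2.5, k0=1.2, seed=0))
```

Larger values of `k0` shift the whole distribution upward, which is how the mean connectivity is tuned in this parametrization.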

The hub is the node with the maximal number of outgoing edges, T. The hub size distribution (Fig. 4) is:

$$P(T) = N\, P(k = T)\, [P(k < T)]^{N-1} = \frac{N(\gamma - 1)}{k_0} \left( \frac{T}{k_0} \right)^{-\gamma} \left[ 1 - \left( \frac{T}{k_0} \right)^{-\gamma + 1} \right]^{N-1} \qquad (8)$$

assuming $T \gg k_0$.

FIG. 4: Simulated and theoretical hub distribution (horizontal axis: hub size) for networks with N=3000 nodes, $\gamma = 2.2$ (circles) or $\gamma = 2.8$ (squares), and mean connectivity $\langle K \rangle = 1.2$. Lines: theoretical calculations, Eq. (8).

For $2 < \gamma < 3$, the mean hub scales as:

$$\langle T \rangle = \int^{N-1} T P(T)\, dT \sim N^{1/(\gamma - 1)} \qquad (9)$$

where the mean is over an ensemble of random networks with the same $\gamma$ and mean connectivity (an alternative method of deriving this result has also been given). At $\gamma < 2$, there is a condensation effect [50], where a finite fraction of the nodes have outdegree < 1, and the mean hub size becomes proportional to N. Using (5), and assuming a compact distribution for the number of mutual edges, we find that the subgraph distribution is dominated by the hubs, and that the dominant term is that of the subgraph node with maximal outdegree, s. The number of appearances of each subgraph can be shown to scale as:

$$\langle G \rangle \sim N^{n-g-1} \sum_{i=1}^{N} K_i^{s} \qquad (10)$$

where $g = g_a + 2 g_m$ is the total number of edges in the subgraph [53]. We derive the scaling exponent $\alpha$ in the following section.
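The scaling in Eq. (9) can be checked numerically. The sketch below relies on a step that is not in the text: substituting $x = (T/k_0)^{1-\gamma}$ in Eq. (8) and integrating over all $T \geq k_0$ (ignoring the cutoff at N - 1) gives the closed form $\langle T \rangle = k_0\, \Gamma(1-b)\, \Gamma(N+1) / \Gamma(N+1-b)$ with $b = 1/(\gamma-1)$, valid for $\gamma > 2$. The names `mean_hub` and `fitted_exponent` are hypothetical.

```python
from math import exp, lgamma, log

def mean_hub(N, gamma, k0=1.0):
    """Mean hub size from Eq. (8), integrated without the cutoff (assumption)."""
    b = 1.0 / (gamma - 1.0)
    if b >= 1.0:
        raise ValueError("closed form requires gamma > 2")
    # Log-gamma keeps the ratio Gamma(N+1)/Gamma(N+1-b) finite for large N.
    return k0 * exp(lgamma(1.0 - b) + lgamma(N + 1.0) - lgamma(N + 1.0 - b))

def fitted_exponent(gamma, N1=1000, N2=100000):
    """Log-log slope of the mean hub size between two network sizes."""
    return log(mean_hub(N2, gamma) / mean_hub(N1, gamma)) / log(N2 / N1)

for gamma in (2.2, 2.5, 2.8):
    print(gamma, fitted_exponent(gamma), 1.0 / (gamma - 1.0))
```

The fitted slope and $1/(\gamma - 1)$ should agree closely for large N, in line with Eq. (9).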



IV. TRANSITIONS AT DIFFERENT $\gamma$

The subgraph numbers scale as

$$\langle G \rangle \sim N^{\alpha} \qquad (11)$$

We find three different regimes, in each of which the scaling exponent $\alpha$ behaves differently. Taking an ensemble average by integrating the largest term in Eq. (10) over the hub distribution (8) we get:

$$\langle G \rangle \sim N^{n-g-1} \int T^{s} P(T)\, dT \qquad (12)$$

For $\gamma < 2$ the network is in a condensed regime, where the hub $T = O(N)$. In this regime:

$$\langle G \rangle \sim N^{n-g+s-1} \qquad (13)$$
TABLE II: The scaling exponent $\alpha$ of subgraph numbers for random scale-free networks with outgoing degree exponent $\gamma$. The subgraph numbers scale as $\langle G \rangle \sim N^{\alpha}$. Shown are all thirteen 3-node connected directed subgraphs and 4 examples of 4-node subgraphs. n is the number of nodes in the subgraph, g the number of edges and s the maximal outdegree within the subgraph. The exponent $\alpha$ has 3 regimes: $\alpha_{erdos}$ in the "Erdos regime", when $\gamma > \gamma_c$, $\alpha_{sf}$ in the "scale-free regime", when $2 < \gamma < \gamma_c$, and $\alpha_{cond}$ in the "condensed regime", when $\gamma < 2$.

FIG. 5: Scaling exponent of 3-node subgraphs (a) and 4-node subgraphs (b) as a function of $\gamma$. The exponent $\alpha$ was obtained from the slope of a log-log fit of the number of subgraphs vs. network size, for 9 different network sizes (30, 100, 300, 500, 1000, 1500, 2000, 2500, 3000), averaged over 5000 randomized networks for each size and outdegree power law $\gamma$. All the networks had mean connectivity $\langle K \rangle = 1.2$. The exponent $\alpha$ displays three regimes: $\gamma < 2$ (the condensed regime), $2 < \gamma < \gamma_c$ (the scale-free regime), and $\gamma > \gamma_c$ (the Erdos regime).

For $2 < \gamma < \gamma_c$, substituting Eq. (8) in Eq. (12) yields:

$$\langle G \rangle \sim N^{n-g+s-\gamma+1} \qquad (14)$$

In this regime, the tail of $P(T)$ is the dominant contribution to the integral. Finally, at values above a critical $\gamma$, another transition occurs, where $\alpha$ equals the scaling exponent in Erdos networks, $\alpha = n - g = \alpha_{erdos}$. The critical $\gamma$ is $\gamma_c$:

$$\gamma_c = s + 1 \qquad (15)$$

In this regime, the hubs no longer contribute significantly to the subgraph distribution. In summary, $\langle G \rangle \sim N^{\alpha}$, where $\alpha$ is:

$$\alpha = \begin{cases} n - g + s - 1 & \gamma < 2 \\ n - g + s - \gamma + 1 & 2 < \gamma < s + 1 \\ n - g & \gamma > s + 1 \end{cases} \qquad (16)$$
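Eq. (16) is simple enough to tabulate directly. The following is a small sketch; the function name `scaling_exponent` is hypothetical, and the boundary values $\gamma = 2$ and $\gamma = s + 1$, which Eq. (16) leaves open, are assigned by continuity since the neighbouring expressions coincide there.

```python
def scaling_exponent(n, g, s, gamma):
    """Scaling exponent alpha of Eq. (16) for a subgraph with n nodes,
    g edges and maximal outdegree s, at degree exponent gamma."""
    gamma_c = s + 1                      # critical exponent, Eq. (15)
    if gamma < 2:
        return n - g + s - 1             # condensed regime
    if gamma < gamma_c:
        return n - g + s - gamma + 1     # scale-free regime
    return n - g                         # Erdos regime

# Feed-forward loop (n=3, g=3, s=2) and 3-node cycle (n=3, g=3, s=1).
print(scaling_exponent(3, 3, 2, 2.5))
print(scaling_exponent(3, 3, 1, 2.5))
```

Because the 3-node cycle has s = 1, its critical exponent is 2 and it never enters the scale-free regime, whereas the feed-forward loop does for any $\gamma$ between 2 and 3.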
Table II shows the expected scaling exponent for the 13 connected directed 3-node subgraphs, as well as for several 4-node subgraphs. The scaling laws agree very well with numerical results (Fig. 5). The three regimes of scaling are clearly seen. Note that the topology of each subgraph affects its scaling, through the subgraph maximal outdegree, s. These results can be easily extended to the case of scale-free indegree and non-directed networks. For loops of any size in non-directed networks the critical $\gamma$ is $\gamma_c = 3$. At $\gamma > 3$, loop numbers scale as $N^{0}$. This is consistent with previous work which showed logarithmic corrections for the number of loops in Barabasi-Albert scale-free networks, which have $\gamma = 3$.



V. DISCUSSION

To summarize, we have presented an approximate solution for the average number of directed connected subgraphs in an ensemble of random networks with arbitrary degree sequence. We have presented scaling formulas for the number of subgraphs in scale-free random networks, and showed that the subgraph numbers can be very different from those in Erdos random networks. Whereas in Erdos random networks the scaling exponent is strictly determined by the number of nodes and edges of the subgraph, in scale-free random networks the exact topology of the subgraph determines the scaling exponent. We showed that the scaling exponent $\alpha$ exhibits three different scaling laws in three regimes, depending on the control parameter $\gamma$ (the power of the degree distribution). In the common case of scale-free networks with $\gamma$ between 2 and 3, there are many more subgraphs which contain a node connected to more than one other node than in the corresponding Erdos networks with the same mean connectivity. For example, the feed-forward loop (id38 in Table I) is much more common for $\gamma < 3$. At $\gamma = 2.5$, the number of feed-forward loops scales as $N^{0.5}$, as opposed to $N^{0}$ in Erdos networks. On the other hand, subgraphs such as the 3-node cycle (id98 in Table I) have the same scaling, $N^{0}$, as in Erdos networks.

This study adds to our understanding of the random network models to which real-world networks are compared. It highlights the importance of using random networks that preserve the single and mutual degree sequence of the real network. Our approach may be readily extended to networks with multiple colors of edges. The present results may be useful for enumerating subgraphs in very large random networks which are beyond the reach of current numerical algorithms.

APPENDIX A: EDGE PROBABILITIES

Here we give a more detailed derivation for the edge probabilities used in Eqs. (2) and (3). Without loss of generality we treat a network with no mutual edges. We denote by $E = N \langle K \rangle$ the total number of edges. We begin by calculating the probability that no edge connects a source node with K outgoing edges and a target node with R incoming edges. This happens when all K edges connect to a set of nodes $\{\sigma_i\}_{i=1}^{K}$ which does not contain the target node:

$$p(\text{no edge} \mid \{\sigma_i\}) = \prod_{k=0}^{K-1} \left( 1 - \frac{R}{E - R' - \sum_{i=1}^{k} R_{\sigma_i}} \right) \qquad (A1)$$

where $R'$ is the indegree of the source node (we do not allow self edges). The probability of having no edge is obtained by averaging over all possible sets $\{\sigma_i\}_{i=1}^{K}$:

$$p(\text{no edge}) = \left\langle \prod_{k=0}^{K-1} \left( 1 - \frac{R}{E - R' - \sum_{i=1}^{k} R_{\sigma_i}} \right) \right\rangle_{\{\sigma\}} \qquad (A2)$$

Assuming $\max_{\{\sigma\}} \sum_{i=1}^{K} R_{\sigma_i} \ll E$, and taking the complement as the probability of an edge existing, we obtain:

$$p(\text{edge}) = 1 - \left( 1 - \frac{R}{N \langle K \rangle} \right)^{K} \approx 1 - e^{-KR / N \langle K \rangle} \approx \frac{KR}{N \langle K \rangle} \qquad (A3)$$

where our last approximation assumes $KR \ll N \langle K \rangle$. Intuitively, this result can be understood as K attempts for the source node to connect to the target node, with a probability of $R / N \langle K \rangle$ at each attempt. $R / N \langle K \rangle$ is the probability of an arbitrary edge connecting into the target node. Pairs of nodes in which KR is of the order of $N \langle K \rangle$ will contribute multiple edges in the same direction in the approximation, leading to over-estimation of subgraph numbers in the simulated networks where multiple edges are not allowed (Table I).
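To see where the chain of approximations in Eq. (A3) holds and where it breaks down, the three expressions can be compared numerically. This is only an illustrative sketch: the function name `edge_probabilities` and the example degrees are invented, with E chosen as $N \langle K \rangle$ for N = 2000 and $\langle K \rangle = 1.2$.

```python
from math import exp

def edge_probabilities(K, R, E):
    """Return the three successive expressions of Eq. (A3) for a source node
    with K outgoing edges, a target with R incoming edges, and E = N<K> edges."""
    power_form = 1.0 - (1.0 - R / E) ** K
    exponential_form = 1.0 - exp(-K * R / E)
    linear_form = K * R / E
    return power_form, exponential_form, linear_form

E = 2000 * 1.2
print(edge_probabilities(3, 2, E))     # typical pair: KR << E, all three agree
print(edge_probabilities(200, 50, E))  # hub pair: KR > E, linear form exceeds 1
```

For the hub pair the linear form is no longer a probability, which is the regime in which the approximation counts multiple edges in the same direction.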



APPENDIX B: NON-SPARSE NETWORKS

In calculating the number of appearances of subgraphs in non-sparse networks, a more accurate approximation takes into account the probabilities of a non-existent edge between two nodes. For subgraphs that contain such null edges, Eq. (5) counts, in addition to the specified subgraph, a set of subgraphs with the null edges replaced by single or mutual edges. The corrections for the 3-node subgraphs express the corrected value $\langle G^* \rangle$ in terms of $\langle G \rangle$, the values obtained from Eq. (5). Generally, for larger subgraphs the corrections made will be of an inclusion-exclusion type.

TABLE III: Matrix formulas for the numbers of all 3-node connected directed subgraphs. M is the adjacency matrix, S is its symmetric component, and A its asymmetric component. $A'$ is the transposed matrix, $\bar{A}$ is the logical inverse of matrix A, and tr A is the matrix trace.

APPENDIX C: SUBGRAPH ENUMERATION

In numerically enumerating the subgraphs we combine a dynamic programming method, which is applied generally for n-node subgraphs with $n \geq 4$, and a more rapid calculation, based on adjacency matrix operations, used for 3-node subgraphs. The method generalizes earlier results. Here we give formulas for the thirteen 3-node connected directed subgraphs based on the adjacency matrix. The network adjacency matrix is denoted by M, where $M_{ij} = 1$ if a directed edge exists from node i to node j. We begin by dividing the network into a network containing only antisymmetric arrows, whose adjacency matrix will be denoted by A, and a network containing only mutual arrows, whose symmetric adjacency matrix will be denoted by S:

$$M = A + S \qquad (C1)$$

We denote by AB the matrix multiplication of matrices A and B, and by $A \cdot B$ the dot (element-by-element) multiplication. $\bar{A}$ is the logical inverse of matrix A, where the zero elements of A are the ones of $\bar{A}$ and vice versa. $A'$ is the transpose of A. A summation denotes summation over all the matrix indices. The matrix formulas for the 13 directed connected 3-node subgraphs are given in Table III. For example, id38 has two nodes which are connected by a path of 2 edges and a path of one edge. $(A^2)_{ij}$ is the number of length-2 paths from node i to node j. Dot-multiplication with matrix A and summation of the terms of the resultant matrix gives the correct count. In some of the subgraphs a correction is made for the terms on the diagonal (id6, id36, id78).
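The id38 count described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a transcription of Table III: the helper names are hypothetical, M is assumed to be a 0/1 matrix with a zero diagonal, and the 3-node cycle count uses the standard identity tr($A^3$)/3, which is included only as a second example of the same matrix approach.

```python
def split_adjacency(M):
    """Split M into asymmetric part A and mutual part S, so that M = A + S (Eq. C1)."""
    n = len(M)
    S = [[M[i][j] * M[j][i] for j in range(n)] for i in range(n)]
    A = [[M[i][j] - S[i][j] for j in range(n)] for i in range(n)]
    return A, S

def matmul(X, Y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_feed_forward_loops(M):
    """id38: sum over all entries of (A^2 . A), with '.' the element-wise product."""
    A, _ = split_adjacency(M)
    A2 = matmul(A, A)
    n = len(A)
    return sum(A2[i][j] * A[i][j] for i in range(n) for j in range(n))

def count_three_cycles(M):
    """3-node cycles of single edges via tr(A^3) / 3 (standard identity, assumed here)."""
    A, _ = split_adjacency(M)
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(len(A))) // 3

ffl = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]      # edges 0->1, 0->2, 1->2
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]    # edges 0->1, 1->2, 2->0
print(count_feed_forward_loops(ffl), count_three_cycles(cycle))
```

Working with A rather than M guarantees that every edge of a counted subgraph is a single edge, so subgraphs containing mutual edges are not mistaken for id38.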